Proposed Formula Based on Study of Correlation between Hub and Spoke Architecture and Bus Architecture in Data Warehouse Architecture, Based on Distinct Parameters

Rajdeep Chowdhury*, Bikramjit Pal and Saikat Ghosh

Department of Computer Application, JIS College of Engineering Block ‘A’, Phase III, Kalyani, Nadia-741235, West Bengal, India

ABSTRACT:

Data warehousing has evolved with every passing decade, and it is now an established part of analytical methodology. At present, data warehousing is capable of furnishing key performance metrics to high-level management, providing analytical capability to middle-level management, and feeding corrective data back to the lower levels on the basis of information derived from the analytical system. The data warehouse market is currently driven by business-oriented solutions that focus on domain-specific challenges and the related issues that shape the fundamentals of data warehousing. Present-day business ideologies in a globalized economy pose new and tougher challenges for data warehouse designers and architects, who must assemble larger and better solutions. Although numerous methods are available to cope with these challenges, they have not made much of an impact globally, and the competitive market continues to prompt ventures into unexplored territory. Data warehouses are designed to facilitate reporting and analysis. This characteristic centers on data storage: the warehouse acts much like a buffer that absorbs a continuous stream of data, which is processed through numerous iterative steps into information needed by all tiers of an organization for decision-making. For over a decade, discussions and even controversies have lingered about which of the existing architectures is the best data warehouse architecture. The two “giants” of the data warehousing field, Bill Inmon and Ralph Kimball, are at the heart of the disagreement. Inmon advocates the Hub and Spoke architecture (for example, the Corporate Information Factory), while Kimball promotes the data mart Bus architecture with conformed dimensions. There are other architectural alternatives, but these two are fundamentally different approaches, and each has strong advocates among implementers.

KEYWORDS: Data Warehouse, Hub and Spoke Architecture, Bus Architecture, Business Intelligence, Federated and Data Mart, Repetition Constant, Propagation Constant.

INTRODUCTION:

A data warehouse is a repository of an organization's electronically stored data. [2] [3] Data warehouses are designed to facilitate comprehensive reporting and thorough analysis. [3]

The data warehouse architectures that are widely used in the industry fall mainly into five types: [4]

a) Independent data mart

b) Data mart Bus architecture

c) Hub and Spoke architecture

d) Centralized data warehouse (no dependent data marts)

e) Federated

A web-based survey was conducted to collect data regarding the performance of each of the above architectures. The survey included questions about the respondent, the respondent’s company, the company’s data warehouse and the success of the data warehouse architecture. The positions of the respondents were distributed relatively evenly among data warehouse managers, data warehouse staff members, IS managers and independent consultants/system integrators.

CONCEPTUAL LITERATURE REVIEW:

To fully understand the impact of a data warehouse in an industrial scenario, it is important first to step back and establish what a data warehouse is, through a feasibility study and an evaluation of various scenarios, and then to ask why it should be implemented and from what perspective it will be accountable. Lastly, the conceptual literature review focuses on how the distinct architectures are correlated via the proposed formula in current industrial practice, keeping in mind both the subject world and the usage world. An instance of the proposed formula is worked through.

SUCCESS OF THE ARCHITECTURE:

Four measures were used to assess the success of the architectures: (1) information quality, (2) system quality, (3) individual impacts, and (4) organizational impacts. The questions used a seven-point scale, with a higher score indicating a more successful architecture. The table below shows the average scores for the measures across the architectures.

Independent data marts scored the lowest on all measures. This finding confirms the conventional wisdom that independent data marts are a poor architectural solution.

Next lowest on all measures was the federated architecture. Firms sometimes have disparate decision-support platforms resulting from mergers and acquisitions, and they may choose a federated approach, at least in the short run. The findings suggest that the federated architecture is not an optimal long-term solution.

	Independent Data Mart [1]	Bus Architecture [2]	Hub and Spoke Architecture [3]	Centralized (No Dependent Data Marts) [4]	Federated [5]
Information Quality	4.42	5.16	5.35	5.23	4.73
System Quality	4.59	5.60	5.56	5.41	4.69
Individual Impact	5.08	5.80	5.62	5.64	5.15
Organizational Impact	4.66	5.34	5.24	5.30	4.77

Correlation between Hub and Spoke Architecture and Bus Architecture

	Hub and Spoke Architecture [x]	Bus Architecture [y]	x*y	x²	y²
Information Quality	5.35	5.16	27.606	28.6225	26.6256
System Quality	5.56	5.60	31.136	30.9136	31.36
Individual Impact	5.62	5.80	32.596	31.5844	33.64
Organizational Impact	5.24	5.34	27.9816	27.4576	28.5156
Total	21.77	21.90	119.3196	118.5781	120.1412

The coefficient of correlation is defined as:

$$R = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{N}\right)\left(\sum y^2 - \frac{(\sum y)^2}{N}\right)}} \tag{i}$$

The value of R calculated from the data given above is R = 0.8562.
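As a check, the totals from the table can be substituted into equation (i). The sketch below assumes N = 4, the number of parameters (rows) in the table; intermediate figures are rounded for display.

```latex
\begin{aligned}
\sum xy - \frac{\sum x \sum y}{N} &= 119.3196 - \frac{21.77 \times 21.90}{4} = 119.3196 - 119.19075 = 0.12885 \\
\sum x^2 - \frac{(\sum x)^2}{N} &= 118.5781 - \frac{(21.77)^2}{4} = 118.5781 - 118.483225 = 0.094875 \\
\sum y^2 - \frac{(\sum y)^2}{N} &= 120.1412 - \frac{(21.90)^2}{4} = 120.1412 - 119.9025 = 0.2387 \\
R &= \frac{0.12885}{\sqrt{0.094875 \times 0.2387}} \approx \frac{0.12885}{0.15049} \approx 0.8562
\end{aligned}
```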

Correlation among All Five Data Warehouse Architectures via Case Study

	Independent Data Mart [1]	Bus Architecture [2]	Hub and Spoke Architecture [3]	Centralized (No Dependent Data Marts) [4]	Federated [5]
Information Quality	4.42	5.16	5.35	5.23	4.73
System Quality	4.59	5.60	5.56	5.41	4.69
Individual Impact	5.08	5.80	5.62	5.64	5.15
Organizational Impact	4.66	5.34	5.24	5.30	4.77

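PROPOSED FORMULA:

$$f_{cd} = \frac{\sum x_{i_c} y_{j_d} - \frac{\sum x_{i_c} \sum y_{j_d}}{N}}{\sqrt{\left(\sum x_{i_c}^2 - \frac{(\sum x_{i_c})^2}{N}\right)\left(\sum y_{j_d}^2 - \frac{(\sum y_{j_d})^2}{N}\right)}}$$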

Where “c” stands for the Repetition Constant for “i”, and “d” stands for the Propagation Constant for “j”:

· c = 1 to n − 1 (where n is the number of variables)

· d = 2 + m to n (where m increases until c reaches n − 1)

r = f / ⁿC₂ (where n is the number of variables)
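The index ranges above can be read as visiting every unordered pair of variables exactly once. The following Python sketch illustrates that reading; it assumes m = c − 1 (an interpretation, since the text does not define m explicitly), and `index_pairs` is a hypothetical helper name introduced only for illustration.

```python
from math import comb


def index_pairs(n):
    """Return the (c, d) index pairs with c = 1..n-1 and d = 2+m..n."""
    pairs = []
    for c in range(1, n):              # c = 1 to n-1
        m = c - 1                      # assumption: m grows by one with each step of c
        for d in range(2 + m, n + 1):  # d = 2+m to n
            pairs.append((c, d))
    return pairs


pairs = index_pairs(5)
print(pairs)
# The number of pairs should equal nC2 for n variables.
print(len(pairs) == comb(5, 2))
```

Under this reading, each pair (c, d) identifies one coefficient, and the count of pairs is the ⁿC₂ that appears as the divisor in r.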

WORKING:

As there are 5 different architectures, the value of ⁿC₂ is:

⁵C₂ = 10

For the 10 different combinations, we have the following:

f₁₂, f₁₃, f₁₄, f₁₅, f₂₃, f₂₄, f₂₅, f₃₄, f₃₅, f₄₅

The values of the above combinations are:

f₁₂ = 0.8569, f₁₃ = 0.5917, f₁₄ = 0.9375, f₁₅ = 0.9377, f₂₃ = 0.8562,

f₂₄ = 0.9639, f₂₅ = 0.7011, f₃₄ = 0.8334, f₃₅ = 0.5446, f₄₅ = 0.8613

Thus, the value of “f” is:

f = f₁₂ + f₁₃ + f₁₄ + f₁₅ + f₂₃ + f₂₄ + f₂₅ + f₃₄ + f₃₅ + f₄₅ = 8.0843

Therefore, the final value of “r” is:

r = f / ⁿC₂ = 8.0843 / 10 = 0.80843
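The working above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the names `scores` and `correlation` are hypothetical, the architecture numbers 1 to 5 follow the table headers, and N is taken to be the four parameters. The printed values can be compared with the figures reported above; small differences in the last digit may arise from rounding.

```python
from itertools import combinations
from math import sqrt

# Average survey scores per architecture, in the order: information quality,
# system quality, individual impact, organizational impact.
scores = {
    1: [4.42, 4.59, 5.08, 4.66],  # independent data mart
    2: [5.16, 5.60, 5.80, 5.34],  # bus architecture
    3: [5.35, 5.56, 5.62, 5.24],  # hub and spoke architecture
    4: [5.23, 5.41, 5.64, 5.30],  # centralized (no dependent data marts)
    5: [4.73, 4.69, 5.15, 4.77],  # federated
}


def correlation(x, y):
    """Coefficient of correlation in the sum-based form of the formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (sxy - sx * sy / n) / sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))


# One coefficient f_cd for every pair of architectures with c < d.
f = {
    (c, d): correlation(scores[c], scores[d])
    for c, d in combinations(sorted(scores), 2)
}

# r is the sum of all f_cd divided by the number of pairs (nC2).
r = sum(f.values()) / len(f)

for (c, d), value in f.items():
    print(f"f{c}{d} = {value:.4f}")
print(f"r = {r:.5f}")
```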

CONCLUSION:

From this evaluation, we conclude that the data warehouse and the data mart coexist: both serve user analysis and reporting, both are designed and conceptualized from the user's perspective, and both have prominent practical application in the real world.

The result shows that the hub and spoke architecture and the bus architecture are very closely correlated (R = 0.8562) across all the distinct parameters. This finding helps explain why these competing architectures have survived over time and through periodic turmoil. They are equally successful for their intended purposes, and each appears to retain its own adherents. In terms of information quality, system quality, individual impact and organizational impact, neither architecture is dominant or markedly superior to the other.

Similarly, the correlation among the other architectures can be found, which helps explain their continued existence in the industry. In some ways, the architectures have evolved over time and become more similar. Even the development methodologies (for example, the top-down methodology for the hub and spoke and centralized architectures, and the life cycle or bottom-up methodology for the bus architecture) have evolved and become more similar.

REFERENCES:

1. A. Mendelzon, C. Hurtado and D. Lemire, “Data Warehousing and OLAP: A Research-Oriented Bibliography”

2. B. Pal, “Comparison of Data Warehouse Architecture Based on Data Model”, International Journal of Information Technology & Knowledge Management, July-December 2010, Volume 2, Number 2, pp. 303-304

3. R. Chowdhury, B. Pal, “Proposed Hybrid Data Warehouse Architecture Based on Data Model”, International Journal of Computer Science & Communication, July-December 2010, Volume 1, Number 2, pp. 211-213

4. http://en.wikipedia.org/wiki/Data_warehouse

5. http://db.stanford.edu/pub/papers/warehouse-research

Received on 03.04.2011

Modified on 12.04.2011

Accepted on 17.04.2011

Research J. Science and Tech. 3(3): May-June. 2011: 154-157